functional regularisation
Continual Deep Learning by Functional Regularisation of Memorable Past
Continually learning new skills is important for intelligent systems, yet standard deep learning methods suffer from catastrophic forgetting of the past. Recent works address this with weight regularisation. Functional regularisation, although computationally expensive, is expected to perform better, but rarely does so in practice. In this paper, we fix this issue by proposing a new functional-regularisation approach that utilises a few memorable past examples that are crucial for avoiding forgetting. By using a Gaussian-process formulation of deep networks, our approach enables training in weight-space while identifying both the memorable past and a functional prior. Our method achieves state-of-the-art performance on standard benchmarks and opens a new direction for life-long learning, where regularisation-based and memory-based methods are naturally combined.
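As a rough illustration of the general idea (not the authors' GP-based objective), the sketch below penalises drift of the network's outputs at a few stored "memorable" inputs while training on a new task; all names, shapes, and the plain squared-error penalty are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy network plus a handful of "memorable" past inputs and their stored
# function values. This is a generic functional-regularisation sketch;
# the paper's GP formulation additionally uses posterior uncertainty to
# weight this penalty, whereas here a plain squared error is used.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
mem_x = torch.randn(8, 4)                # memorable past inputs
with torch.no_grad():
    mem_f = model(mem_x)                 # past function values to preserve

def functional_regulariser(net, mem_x, mem_f, strength=1.0):
    # Penalise deviation of current outputs from stored past outputs.
    return strength * ((net(mem_x) - mem_f) ** 2).mean()

# One new-task update: task loss plus the functional penalty.
x, y = torch.randn(32, 4), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(model(x), y) \
       + functional_regulariser(model, mem_x, mem_f)
loss.backward()
```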
Review for NeurIPS paper: Continual Deep Learning by Functional Regularisation of Memorable Past
All four expert reviewers were positive about this work, and the author rebuttal, along with a lively post-rebuttal discussion, further improved their opinions. I agree with the reviewers that this is a high-quality paper, and my decision is to accept. I encourage the authors to take the reviewer suggestions into account -- especially the promised experiments with longer task sequences and a discussion of the connections between gradient-based sample selection and the proposed memorable-sample selection approach.
Review for NeurIPS paper: Continual Deep Learning by Functional Regularisation of Memorable Past
What are the real contributions of the paper? The idea of regularizing the outputs (i.e., functional regularization) has already been explored, as the paper itself acknowledges. Combining output regularization with memory-based methods has also been explored previously; see GEM [1] and A-GEM [2]. What makes this approach better or important, e.g.
Functional Regularisation for Continual Learning using Gaussian Processes
Titsias, Michalis K., Schwarz, Jonathan, Matthews, Alexander G. de G., Pascanu, Razvan, Teh, Yee Whye
We introduce a novel approach for supervised continual learning based on approximate Bayesian inference over function space rather than over the parameters of a deep neural network. We use a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. Functional regularisation for continual learning arises naturally by applying the variational sparse GP inference method sequentially as new tasks are encountered. At each step, a summary of the current task is constructed that consists of (i) inducing inputs and (ii) a posterior distribution over the function values at these inputs. This summary then regularises the learning of future tasks, through Kullback-Leibler regularisation terms that appear in the variational lower bound, and reduces the effects of catastrophic forgetting. We fully develop the theory of the method and demonstrate its effectiveness on classification datasets such as Split-MNIST, Permuted-MNIST and Omniglot.
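To make the mechanism concrete, here is a schematic form of the per-task variational objective as we read it from the abstract; the notation (inducing-point function values u_i, stored task summaries q_i) is ours, not taken from the paper itself.

```latex
% Schematic per-task objective, reconstructed from the abstract's description
% (illustrative notation, not the paper's exact formulation).
% At task k: fit the current data, regularise the new summary q(u_k) towards
% the GP prior, and keep the function values u_i at past tasks' inducing
% inputs close to their stored summaries q_i(u_i).
\[
\mathcal{L}_k =
  \mathbb{E}_{q(f)}\!\left[ \log p(\mathcal{D}_k \mid f) \right]
  \;-\; \mathrm{KL}\!\left( q(u_k) \,\|\, p(u_k) \right)
  \;-\; \sum_{i<k} \mathrm{KL}\!\left( q(u_i) \,\|\, q_i(u_i) \right)
\]
```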
- North America > United States > Texas > Travis County > Austin (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- Research Report (1.00)
- Workflow (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)